Abstract

Long-run average properties of probabilistic systems
refer to the average behavior of the system, measured
over a period of time whose length diverges to infinity.
These properties include many relevant performance
and reliability indices, such as system throughput,
average response time, and mean time between failures.

In this paper, we argue that current formal
specification methods cannot be used to specify long-run
average properties of probabilistic systems. To enable
the specification of these properties, we propose an
approach based on the concept of experiments.
Experiments are labeled graphs that can be used to describe
behavior patterns of interest, such as the request for
a resource followed by either a grant or a rejection.
Experiments are meant to be performed infinitely often,
and it is possible to specify their long-run average
outcome or duration.

We propose simple extensions of temporal logics
based on experiments, and we present model-checking
algorithms for the verification of properties of
finite-state timed probabilistic systems in which both
probabilistic and nondeterministic choice are present. The
consideration of system models that include nondeterminism
enables the performance and reliability analysis
of partially specified systems, such as systems in
their early design stages.

1  Introduction 

Long-run average properties of probabilistic systems
include many classical performance and reliability
indices, such as system throughput, average response
time, and mean time between failures. These
properties refer to the average behavior of the system,
measured over a period of time whose length diverges
to infinity [15]. In systems modeled as Markov chains,
long-run average properties are related to the
steady-state distribution of the chain [20]. In this paper we
argue that current approaches to formal specification
are not suited for the study of long-run average
properties of probabilistic systems, and we present
verification and specification methods that overcome this
limitation.

Current approaches to the specification and
verification of probabilistic systems are based either on
extensions of temporal logics, or on probabilistic
process algebras, simulation and bisimulation relations,
and testing preorders.

Temporal logics for the specification of quantitative
properties of probabilistic systems have been
presented in [18, 2, 7, 21, 13], and a probabilistic
duration calculus has been studied in [24]. These
logics enable the specification of bounds for the
probability of satisfying temporal or duration calculus
formulas, starting from given subsets of system states.
Model-checking algorithms for these logics have been
presented in [10, 11, 7, 21, 13]. These logics can be
used to specify many properties of interest, such as
bounds on the probability of meeting a deadline, or
of reaching a deadlock, starting from a given set of
states. These properties are related to ensemble
averages (or probabilities) over the set of behaviors that
originate from single system states.

Long-run average properties of systems are related
to time averages along system behaviors, rather than
ensemble averages. This indicates that the above
logics cannot be used to specify long-run average
properties of systems. In fact, even in the case of purely
probabilistic systems, these logics cannot take into account
the long-run average probability of being at given
system states, or the long-run average outcome of system
choices. This limits their ability to capture a large
number of classical performance and reliability
properties.

Another approach to the specification of probabilistic
systems is based on the use of probabilistic process
algebras, simulation and bisimulation relations,
and testing preorders (see for instance [23, 9, 29], or
[28, 12] for more comprehensive summaries of this
approach). Simulation and bisimulation relations
preserve the long-run average behavior of probabilistic
systems, but they offer no direct method for the
specification of performance and reliability properties
connected to this type of behavior. The approaches based
on tests make it possible to quantify the probability with
which the system passes the tests. Since the tests are
generally executed only once, they cannot be used to
measure long-run average properties.

This paper presents a method for the formal
specification and verification of long-run average
properties of probabilistic systems. The method is based on
the concept of experiments, inspired by the tests of
process algebra. Experiments are labeled graphs that
can be used to describe behavior patterns of interest,
such as the request for a resource followed by either
a grant or a rejection. Experiments associate with
each occurrence of these patterns an outcome: a real
number representing the success or the duration of the
pattern. For example, we can associate with the above
experiment the outcome 1 if the request is granted, and 0
if it is rejected. Unlike tests, experiments are meant
to be performed infinitely often, and it is possible to
measure their long-run average outcome. The long-run
average outcome of the above experiment is equal
to the long-run average fraction of requests that are
granted. Experiments provide a specification method
for long-run average properties that is semantically
sound and easy to understand, even when the system
model includes nondeterminism.

We model the behavior of probabilistic systems in
terms of probability, nondeterminism, and time: our
models are based on Markov decision processes,
augmented by additional information on the timing
behavior of the system. The inclusion of nondeterminism
in the system model enables the study of performance
and reliability of partially specified systems, such as
systems in their early design stages.

We propose simple extensions of branching-time
temporal logics based on experiments. These
extensions are obtained by introducing new operators that
express bounds on the long-run average outcome
of experiments. These extensions enable the use
of a single language for the specification of correctness,
reliability, and performance properties of systems. We
also discuss the relationship between the long-run
average properties studied in this paper and the
properties expressible in previous probabilistic extensions of
temporal logic.


Finally, we present model-checking algorithms for
the verification of long-run average properties of
finite-state systems. The model-checking algorithms are
based on new results from the theory of Markov
decision processes. They have polynomial time
complexity both in the size of the system and in the
size of the experiment, indicating that this proposal
provides a practical approach to the formal
specification and verification of long-run average properties of
systems.

Formal  Methods  for  Performance  Modeling 

To complete our review of related work, we mention
the use of formal methods for the construction of
performance models of systems. While this approach does
not deal with the issue of specification languages, it
nevertheless provides methods for measuring several
performance and reliability indices of purely
probabilistic systems (i.e. systems not containing
nondeterministic choice).

A popular approach to the construction of
performance models relies on probabilistic extensions
of Petri nets, such as the stochastic Petri nets of
[26, 25, 30] and the generalized stochastic Petri nets
(GSPNs) of [1]. The transitions of a GSPN can either
fire immediately, or fire with an exponential delay
distribution. GSPNs can be translated into
continuous-time Markov chains, thus enabling the performance
analysis of the systems.

Another approach to the compositional modeling of
probabilistic systems is based on extensions of process
algebras. The process algebras MPA and EMPA
associate delay distributions with the actions [5]. An
EMPA system model can be either translated into
a GSPN, or directly into a continuous-time Markov
chain, thus allowing the performance evaluation of the
system [4, 3]. The idea of associating delay
distributions with the actions is also at the basis of the process
algebras PEPA [19] and TIPP [16]. System models
written in PEPA and in subsets of TIPP can again be
translated into continuous-time Markov chains.

These formalisms enable the performance modeling
only of purely probabilistic systems: nondeterminism
can be present in system sub-components, but
not in the complete model that is translated into a
continuous-time Markov chain.

The performance and reliability quantities of
interest can be measured by adding annotations to the
models. In a GSPN reward model, a reward rate is
associated with each place and transition of the net [8];
in PEPA, a reward rate can be associated with each
action. The average reward per unit of time can then
be computed by solving the continuous-time Markov
chains obtained from the systems.

The experiments we introduce here also provide
a flexible way of associating rewards with a system.
Additionally, using experiments we can measure not
only the amount of reward per unit of time, but also
the amount of reward per experiment. For systems
that can be translated into ergodic Markov chains [20],
the long-run average outcome per experiment can be
obtained by computing separately the rates of
outcome generation and of experiment completion, and by
taking the ratio between the two. However, in the case
of systems with nondeterminism this approach fails,
and the introduction of experiments leads to more
expressive specification methods.
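The ratio construction for ergodic chains can be illustrated with a minimal sketch. It assumes a discrete-time chain with unit-duration steps (the formalisms cited above use continuous-time chains), and the names `stationary`, `outcome_per_experiment`, and the three-state chain are hypothetical choices made only for this illustration.

```python
def stationary(P, iters=2000):
    """Approximate the stationary distribution of an ergodic chain
    by power iteration, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return pi


def outcome_per_experiment(P, r, w):
    """Ratio between the rate of outcome generation and the rate of
    experiment completion. r[s][t] is the outcome earned on the
    transition s -> t; w[s][t] is 1 if that transition completes an
    experiment and 0 otherwise."""
    pi = stationary(P)
    n = len(P)
    edges = [(s, t) for s in range(n) for t in range(n)]
    outcome_rate = sum(pi[s] * P[s][t] * r[s][t] for s, t in edges)
    completion_rate = sum(pi[s] * P[s][t] * w[s][t] for s, t in edges)
    return outcome_rate / completion_rate


# Hypothetical single-user chain: 0 = idle, 1 = requesting, 2 = using.
# A request is granted with probability 0.75 and rejected otherwise.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.0, 0.75],
     [0.5, 0.0, 0.5]]
r = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # outcome 1 when the request is granted
w = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]   # both ways of leaving state 1 end an experiment
ratio = outcome_per_experiment(P, r, w)
```

In this toy chain every experiment ends when the chain leaves state 1, so the ratio coincides with the grant probability. With nondeterminism there is no single stationary distribution to plug in, which is where the ratio construction stops being applicable.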

2  Models  for  Probabilistic  Systems 

Our models for probabilistic systems are based on
Markov decision processes (MDPs). An MDP is a
generalization of a Markov chain in which a set of possible
actions is associated with each state. To each
state-action pair corresponds a probability distribution on
the states, which is used to select the successor state
[14, 6].

Definition 1 (Markov decision process) A
Markov decision process (MDP) $(S, A, p)$ consists of
a finite set $S$ of states, and of two components $A$, $p$
that specify the transition structure:

• For each $s \in S$, $A(s)$ is a non-empty finite set of
actions available at $s$.

• For each $s, t \in S$ and $a \in A(s)$, $p_{st}(a)$ is the
probability of a transition from $s$ to $t$ when action
$a$ is selected. For every $s, t \in S$ and $a \in A(s)$, we
have $0 \le p_{st}(a) \le 1$ and $\sum_{t \in S} p_{st}(a) = 1$. ■
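As an illustration of Definition 1, the following sketch stores an MDP as plain Python containers and checks the two conditions of the definition. The class name `MDP`, its field layout, and the method `validate` are assumptions made for this example, not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: set     # finite set S
    actions: dict   # s -> list of actions A(s)
    p: dict         # (s, a) -> {t: p_st(a)}; missing t means probability 0

    def validate(self, tol=1e-9):
        """Check that A(s) is non-empty and that each p(s, a) is a
        probability distribution over the states."""
        for s in self.states:
            acts = self.actions.get(s, [])
            if not acts:
                raise ValueError(f"no action available at state {s!r}")
            for a in acts:
                dist = self.p[(s, a)]
                for t, q in dist.items():
                    if t not in self.states or not 0.0 <= q <= 1.0:
                        raise ValueError(f"bad entry for {(s, a)!r}: {t!r} -> {q}")
                if abs(sum(dist.values()) - 1.0) > tol:
                    raise ValueError(f"probabilities for {(s, a)!r} do not sum to 1")
        return True


# Hypothetical two-state example.
example = MDP(
    states={"s", "t"},
    actions={"s": ["a", "b"], "t": ["a"]},
    p={("s", "a"): {"s": 0.5, "t": 0.5},
       ("s", "b"): {"t": 1.0},
       ("t", "a"): {"s": 1.0}},
)
ok = example.validate()
```

Storing each distribution as a sparse dictionary keeps the representation close to the definition while avoiding explicit zero entries.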

We will often associate with an MDP additional
labelings to represent quantities of interest; the labelings
will be simply added to the list of components.

A behavior of an MDP is an infinite sequence of
alternating states and actions, constructed by iterating
a two-phase selection process. First, given the
current state $s$, an action $a \in A(s)$ is selected
nondeterministically; second, the successor state $t$ of $s$
is chosen according to the probability distribution
$\Pr(t \mid s, a) = p_{st}(a)$. The formal definition of behavior
is as follows.

Definition 2 (behaviors of MDP) A behavior of
an MDP $\Pi$ is an infinite sequence $\omega : s_0 a_0 s_1 a_1 \cdots$
such that $s_i \in S$, $a_i \in A(s_i)$ and $p_{s_i, s_{i+1}}(a_i) > 0$ for
all $i \ge 0$. We let $X_i$, $Y_i$ be the random variables
representing the i-th state and the i-th action along a
behavior, respectively. Formally, $X_i$ and $Y_i$ are
variables that assume the value $s_i$, $a_i$ on the behavior
$\omega : s_0 a_0 s_1 a_1 \cdots$. ■

Policies and probability of behaviors. For every
state $s \in S$, we denote by $\Omega_s$ the set of behaviors
starting from $s$, and we let $\mathcal{B}_s \subseteq 2^{\Omega_s}$ be the $\sigma$-algebra of
measurable subsets of $\Omega_s$, following the classical
definition of [20]. To be able to talk about the probability of
system behaviors, we need to specify the criteria with
which the actions are chosen. To this end, we use the
concept of policy [14], closely related to the adversaries
of [29, 28] and to the schedulers of [31, 27].

A policy $\eta$ is a set of conditional probabilities
$Q_\eta(a \mid s_0 s_1 \cdots s_n)$, for all $n \ge 0$, $s_0, s_1, \ldots, s_n \in S$
and $a \in A(s_n)$. According to policy $\eta$, after the finite
prefix $s_0 a_0 s_1 \cdots s_n$, action $a \in A(s_n)$ is chosen with
probability $Q_\eta(a \mid s_0 s_1 \cdots s_n)$. Hence, under policy
$\eta$ the probability of following a finite behavior prefix
$s_0 a_0 s_1 a_1 \cdots s_n$ is

$$\prod_{i=0}^{n-1} p_{s_i, s_{i+1}}(a_i) \, Q_\eta(a_i \mid s_0 \cdots s_i).$$

These probabilities for prefixes give rise to a unique
probability measure on $\mathcal{B}_s$. We write $\Pr_s^\eta(\mathcal{A})$ to denote
the probability of event $\mathcal{A}$ in $\Omega_s$ under policy $\eta$, and
$E_s^\eta\{f\}$ to denote the expectation of the random
function $f$ from state $s$ under policy $\eta$.
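The probability of a finite prefix under a policy can be computed directly from the product above. The sketch below assumes that the policy is given as a Python function of the state history and the candidate action; the names `prefix_probability` and `uniform_policy`, and the small transition table, are hypothetical.

```python
def prefix_probability(prefix, p, policy):
    """Probability of the finite prefix [s0, a0, s1, a1, ..., sn].

    p[(s, a)] maps successor states to transition probabilities;
    policy(history, a) returns the probability of choosing action a
    after the tuple of states visited so far."""
    states = prefix[0::2]
    actions = prefix[1::2]
    prob = 1.0
    for i, a in enumerate(actions):
        s, t = states[i], states[i + 1]
        history = tuple(states[: i + 1])
        prob *= p[(s, a)].get(t, 0.0) * policy(history, a)
    return prob


# Hypothetical example: two states, uniform choice among available actions.
p = {("s", "a"): {"s": 0.5, "t": 0.5},
     ("s", "b"): {"t": 1.0},
     ("t", "a"): {"s": 1.0}}
available = {"s": ["a", "b"], "t": ["a"]}


def uniform_policy(history, a):
    acts = available[history[-1]]
    return 1.0 / len(acts) if a in acts else 0.0


prob = prefix_probability(["s", "a", "t", "a", "s"], p, uniform_policy)
```

For the prefix `s a t a s` the product is (0.5 · 0.5) · (1.0 · 1.0) = 0.25. A memoryless policy such as `uniform_policy` only inspects the last state of the history, but the signature also admits history-dependent policies.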

Timed probabilistic systems. Our model for
probabilistic systems is that of timed probabilistic
systems (TPSs). A TPS is an MDP with three additional
labelings that describe the set of initial states, the
timing properties of the system, and the values of a
set of state variables at all system states. TPSs are
closely related to semi-Markov decision processes [6].

For simplicity, we assume a fixed set $\mathcal{V}$ of state
variables.

Definition 3 (TPS) A TPS $(S, A, p, S_{in}, time, \mathcal{I})$ is
an MDP $(S, A, p)$ with three additional components:

• A subset $S_{in} \subseteq S$ of initial states.

• A labeling $time$ that associates with each $s \in S$
and $a \in A(s)$ the expected amount of time
$time(s, a) \in \mathbb{R}^+$ spent at $s$ when action $a$ is
selected.

• A labeling $\mathcal{I}$ that associates with each $x \in \mathcal{V}$ and
$s \in S$ the value $\mathcal{I}_s[x]$ of $x$ at $s$. ■

The relative simplicity of this model enables us to
focus our attention on the specification and
verification of long-run average properties, rather than on
modeling issues. It is possible to define higher-level
compositional models for probabilistic systems that
can be automatically translated into TPSs; an
example of such higher-level models is the stochastic
transition systems of [12]. Several other models can be
similarly translated.

We say that time diverges along a behavior iff
$\sum_{k=0}^{\infty} time(X_k, Y_k)$ diverges. Since behaviors along
which time does not diverge do not have a physical
meaning, we make the following assumption about the
TPSs under consideration:

Non-Zenoness Assumption: For every policy $\eta$
and state $s$, $\Pr_s^\eta\bigl(\sum_{i=0}^{\infty} time(X_i, Y_i) = \infty\bigr) = 1$.

This assumption can be verified using the algorithm
described in [13]. A more general approach to the
problem of time divergence, inspired by [28], is
presented in [12, §8].

3  A  Motivational  Example 

To gain a better understanding of why existing
formal specification methods cannot capture long-run
average properties of systems, we present a simple
example: the specification of the long-run average
probability of gaining access to a shared resource in a
multi-user system.

Specifically, we consider a system shared-res,
consisting of N users that can access a shared resource.
The resource can be used by at most M < N users at
any single time. Each user can be in one of three
states: idle, requesting and using. For the sake of
simplicity, we assume that each step of the system has
unit duration.

Initially, all users are at idle. At each time step, if a
user is at idle it has probability $p$ of going to requesting
and $1 - p$ of staying at idle. If the user is in using, it
has probability $q$ of going to idle, and $1 - q$ of staying
in using. The behavior of a single user is depicted in
Figure 1.

Let $j$ and $k$, with $k \le M$, be the number of users
in requesting and using, respectively, at the beginning
of a time step. The scheduler may grant access to any
number of users between $m = \min\{M_0, M - k, j\}$ and
$n = \min\{M - k, j\}$, where $M_0 > 0$ is a constant. Thus,
from the state there are actions $a_m, a_{m+1}, \ldots, a_n$. If
action $a_l$ is chosen, with $m \le l \le n$, then $l$ among the
$j$ users at requesting are selected uniformly at random
and go to using, while the remaining $j - l$ users are
sent back to idle. From this informal description, for
given $N$, $M$, $M_0$, $p$, and $q$, it is possible to construct
a TPS $\Pi(N, M, M_0, p, q)$.
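The set of scheduler actions available in a state of shared-res can be computed from the counts j and k. The following is a minimal sketch of that computation; the function names `grant_actions` and `grant_probability` are hypothetical, and the per-user probability l / j follows from the uniform random selection described above.

```python
def grant_actions(M, M0, j, k):
    """Indices l of the actions a_l available when j users are
    requesting and k users are using the resource."""
    m = min(M0, M - k, j)   # minimum number of grants the scheduler must make
    n = min(M - k, j)       # maximum number of grants that fit
    return list(range(m, n + 1))


def grant_probability(l, j):
    """Probability that one given requesting user is among the l users
    selected uniformly at random out of j requesting users (j >= 1)."""
    return l / j


some_room = grant_actions(M=3, M0=1, j=2, k=1)   # one or two grants possible
full = grant_actions(M=3, M0=1, j=2, k=3)        # resource fully utilized
```

When the resource is fully utilized (k = M), the only available action is a_0, so a requesting user is granted access with probability 0 from such a state.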


Figure 1: Behavior of a single user in system
shared-res. The transitions from state requesting
depend on the scheduler and on the states of the other
users in the system.

This system was originally devised as a very
simple model for the behavior of people placing phone
calls: $N$ is the number of people, $M$ is the maximum
number of calls that can be active at the same time,
and $M_0$ is the minimum number of new calls that the
phone company guarantees to be able to connect in a
time unit. The transition from idle to requesting
corresponds to the act of lifting the handset to place a
call; the transition from using to idle corresponds to
hanging up at the end of the call. The transitions out
of the requesting state model the acts of either getting
the connection or of hanging up upon hearing the busy
signal. Our intended specification is as follows:

Req1: For any user, the long-run fraction of requests
that are granted is at least $b_0$, for some specified
$0 < b_0 < 1$.

We will attempt to encode this specification in the
probabilistic temporal logic pCTL, derived from CTL
by introducing the probability operator P [17, 7, 18].
The operator P can be used to express bounds on
probabilities, and it is syntactically similar to a path
quantifier. If $\phi$ is a linear-time temporal logic formula,
then $P_{\ge b_0}\phi$ holds at a state $s$ iff the probability that
a system behavior from $s$ satisfies $\phi$ is at least $b_0$ (the
cases for $<$, $\le$, $>$ are analogous).

As the situation is symmetrical for all users, let
us concentrate on the first user. Let $I_1$, $R_1$, $U_1$ be
atomic formulas representing the fact that user 1 is at
idle, requesting or using, respectively. It might seem
plausible at first to encode the requirement Req1 with
the following pCTL formula:

$$A\Box\,\bigl(R_1 \rightarrow P_{\ge b_0}(R_1 \,\mathcal{U}\, U_1)\bigr). \qquad (1)$$

Let us analyze this formula. Call any state that
satisfies $R_1$ an $R_1$-state. Subformula $P_{\ge b_0}(R_1 \,\mathcal{U}\, U_1)$ holds
at an $R_1$-state iff the request of user 1 at that state
will be granted with probability at least $b_0$, regardless
of the policy. By definition of A and □, specification
(1) holds for shared-res iff every reachable $R_1$-state
satisfies $P_{\ge b_0}(R_1 \,\mathcal{U}\, U_1)$.

The problem with specification (1) is that there are
many reachable $R_1$-states in the system: in some of
them, few users are accessing the resource; in others,
the resource is fully utilized. Clearly, the
probability that the first user gets access to the resource from
these latter $R_1$-states is 0. Thus, as long as there
is a reachable $R_1$-state in which $M$ users are already
using the resource, specification (1) will not be
satisfied, regardless of the long-run average probability
with which the first user succeeds in accessing the
resource.

More generally, the problem is that the temporal
logics presented in [17, 2, 7, 13] can specify the
probability with which behaviors satisfy temporal formulas
from given states, but they do not take into account
the long-run probability of being in those states.

A similar argument applies to the specification
approaches based on process algebras, simulation
relations and testing preorders. While these approaches
can characterize many probabilistic properties of
interest, they do not take into account the long-run
probability of being in given system states. Indeed, an
examination of the verification algorithms that have been
proposed to check these relations in purely probabilistic
systems confirms that the computation of steady-state
probability distributions is not among the tasks performed
by the algorithms.

4  Specification  of  Long-Run  Average 
Properties 

The specification method we present is based on
the concept of experiments. An experiment is simply
a finite deterministic automaton, with a distinguished
set of initial states and some additional labelings.
Experiments describe behavior patterns of interest, and
are applied to a TPS by forming the synchronous
composition between the experiment and the TPS. Each
time an experiment is performed, it yields an outcome,
related either to the success, or to the duration, of the
experiment. Accordingly, we distinguish two types of
experiments: P-experiments, to measure discrete
outcomes, and D-experiments, to measure durations.

Definition 4 (experiment) An experiment $\Psi =
(V, E, E_r, V_{in}, \lambda)$ is a labeled graph $(V, E)$, with set
of vertices $V$ and set of edges $E \subseteq V \times V$, and with
the following additional components:

• A set $V_{in} \subseteq V$ of initial vertices.

• A set $E_r \subseteq E - \{(v, v) \mid v \in V\}$ of reset edges.

• A labeling $\lambda$ that assigns to each $u \in V$ a
first-order formula $\lambda(u)$ over the state variables $\mathcal{V}$.

For all $u \in V$, we denote by $dst(u) = \{v \in V \mid (u, v) \in
E\}$ the set of vertices that can be reached in one step
from $u$, and we require $u \in dst(u)$. The labeling of the
vertices must be deterministic and total. Specifically,
the following formulas must be valid (i.e. true in any
type-consistent variable interpretation):

1. $\bigvee_{v \in V_{in}} \lambda(v)$.

2. $\bigvee_{v \in dst(u)} \lambda(v)$, for all $u \in V$.

3. $\neg[\lambda(v_1) \wedge \lambda(v_2)]$, for all distinct $v_1, v_2 \in V_{in}$.

4. $\neg[\lambda(v_1) \wedge \lambda(v_2)]$, for all $u \in V$ and all distinct $v_1, v_2 \in dst(u)$. ■

When the synchronous composition between an
experiment $\Psi$ and a TPS $\Pi$ is formed, the vertex labels
$\lambda$ of $\Psi$ are used to synchronize the transitions of $\Psi$ and
$\Pi$. The fact that $u \in dst(u)$ for all $u \in V$ ensures that,
if the variable assignment does not change, the
experiment remains at the same vertex. Each time a reset
edge is traversed, we say that the experiment ends,
so that the number of reset edges traversed along a
behavior indicates how many experiments have been
completed.

In  a  P-experiment,  we  associate  with  each  reset 
edge  an  outcome:  a  real  number  representing  a  reward 
earned  when  the  experiment  is  ended. 

Definition 5 (P-experiment) A P-experiment is
an experiment in which each reset edge $(u, v) \in E_r$
is labeled with an outcome $j(u, v) \in \mathbb{R}$. ■

Often,  we  are  interested  in  experiments  whose  out¬ 
come  is  binary:  they  can  either  succeed  or  fail.  In  this 
case,  we  associate  outcome  1  with  the  reset  edges  that 
represent  a  successful  completion  of  the  experiment, 
and  outcome  0  with  those  representing  failures. 

In a D-experiment, we specify a set of timed
vertices: the outcome of a D-experiment is equal to the
total time spent at timed vertices.

Definition 6 (D-experiment) A D-experiment is
an experiment with a distinguished nonempty subset
$V_t \subseteq V$ of timed vertices. ■

Example 1 (average success in shared-res) In
Figure 2 we present a P-experiment that can be used
to express specification Req1. If user 1 proceeds from
requesting to using, the outcome is 1; if user 1 proceeds
from requesting to idle, the outcome is 0. Specification
Req1 can be encoded by requiring the long-run average
outcome of the experiment to be at least $b_0$. ■

Figure 2: P-experiment $\Psi_{Req1}$, for the specification of
Requirement Req1 of shared-res. All vertices are
initial; the reset edges are drawn as dashed lines. We
omitted the self-loops and the edges that are never followed
by system shared-res.


4.1  Long-Run Average Outcome of Experiments

To use experiments for the specification of long-run
average properties of systems, we need to compose
them with the system and to define their long-run
average outcome. The synchronous composition with a
system is defined as follows.


Definition 7 (product of TPS and experiment)
Given a TPS $\Pi = (S, A, p, S_{in}, time, \mathcal{I})$ and an
experiment $\Psi = (V, E, E_r, V_{in}, \lambda)$, we define their product
MDP $\Pi_\Psi = \Pi \otimes \Psi = (\hat{S}, \hat{A}, \hat{p}, r, w)$ as follows:

• $\hat{S} = \{(s, u) \mid s \models \lambda(u)\}$, where $s \models \lambda(u)$ holds iff
the formula $\lambda(u)$ is true in the variable
interpretation $\mathcal{I}(s)$ corresponding to $s$.

• For all $(s, u) \in \hat{S}$, we let $\hat{A}((s, u)) = A(s)$.

• For all states $(s, u), (t, v) \in \hat{S}$ and actions $a \in
A(s)$, the transition probabilities and the
labelings $r$, $w$ are defined as follows.

Let $v_t^u \in V$ be the unique vertex such that $t \models
\lambda(v_t^u)$ and $(u, v_t^u) \in E$. Intuitively, if $\Pi$ is at $s$ and
$\Psi$ at $u$, and $\Pi$ takes a transition to $t$, $\Psi$ will take
a transition to $v_t^u$. We define

$$\hat{p}_{(s,u)(t,v)}(a) = \begin{cases} p_{st}(a) & \text{if } v = v_t^u \\ 0 & \text{otherwise,} \end{cases}
\qquad
w((s,u),(t,v)) = \begin{cases} 1 & \text{if } (u, v) \in E_r \\ 0 & \text{otherwise.} \end{cases}$$

If $\Psi$ is a P-experiment, we define

$$r((s,u), a, (t,v)) = \begin{cases} j(u, v) & \text{if } (u, v) \in E_r \\ 0 & \text{otherwise.} \end{cases}$$

If $\Psi$ is a D-experiment, we define

$$r((s,u), a, (t,v)) = \begin{cases} time(s, a) & \text{if } u \in V_t \\ 0 & \text{otherwise.} \end{cases}$$

Since experiments are total and deterministic, to each
$s \in S$ corresponds a unique vertex $u \in V_{in}$ such that
$(s, u) \in \hat{S}$. We denote this unique vertex by $v_{in}(s)$,
for $s \in S$. ■
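The product construction of Definition 7 can be sketched in a few lines. The sketch below makes several assumptions for the sake of illustration: the TPS and the experiment are plain dictionaries, vertex labels are Python predicates over a variable interpretation (standing in for first-order formulas), and the single-user system used as an example is a hypothetical toy rather than the full shared-res model.

```python
def product(tps, exp):
    """Build the product of a TPS and an experiment given as dicts."""
    interp, label = tps["interp"], exp["label"]

    # Product states: pairs (s, u) such that s satisfies the label of u.
    states = [(s, u) for s in tps["states"] for u in exp["vertices"]
              if label[u](interp[s])]

    def successor(u, t):
        # Unique vertex v with (u, v) an edge and t satisfying label(v).
        cands = [v for v in exp["vertices"]
                 if (u, v) in exp["edges"] and label[v](interp[t])]
        if len(cands) != 1:
            raise ValueError("experiment is not total and deterministic")
        return cands[0]

    p_hat, r, w = {}, {}, {}
    for (s, u) in states:
        for a in tps["actions"][s]:
            for t, q in tps["p"][(s, a)].items():
                v = successor(u, t)
                reset = (u, v) in exp["reset"]
                p_hat[(s, u), a, (t, v)] = q
                w[(s, u), (t, v)] = 1 if reset else 0
                if "timed" in exp:   # D-experiment: time at timed vertices
                    r[(s, u), a, (t, v)] = (
                        tps["time"][(s, a)] if u in exp["timed"] else 0.0)
                else:                # P-experiment: outcome of the reset edge
                    r[(s, u), a, (t, v)] = (
                        exp["outcome"][(u, v)] if reset else 0.0)
    return states, p_hat, r, w


# Hypothetical single-user system with a nondeterministic grant/deny choice.
VERTS = ["vI", "vR", "vU"]
tps = {
    "states": ["idle", "req", "use"],
    "actions": {"idle": ["step"], "req": ["grant", "deny"], "use": ["step"]},
    "p": {("idle", "step"): {"idle": 0.5, "req": 0.5},
          ("req", "grant"): {"use": 1.0},
          ("req", "deny"): {"idle": 1.0},
          ("use", "step"): {"idle": 0.5, "use": 0.5}},
    "time": {("idle", "step"): 1.0, ("req", "grant"): 1.0,
             ("req", "deny"): 1.0, ("use", "step"): 1.0},
    "interp": {s: {"loc": s} for s in ["idle", "req", "use"]},
}
exp = {
    "vertices": VERTS,
    "edges": {(u, v) for u in VERTS for v in VERTS},   # includes self-loops
    "reset": {("vR", "vU"), ("vR", "vI")},
    "outcome": {("vR", "vU"): 1.0, ("vR", "vI"): 0.0},
    "label": {"vI": lambda x: x["loc"] == "idle",
              "vR": lambda x: x["loc"] == "req",
              "vU": lambda x: x["loc"] == "use"},
}
states, p_hat, r, w = product(tps, exp)
```

Only transitions listed in the TPS are stored, so entries of the product with probability 0 are left implicit.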

In the product MDP, the sum $\sum_{k=0}^{n-1} w(X_k, X_{k+1})$
indicates how many experiments have been completed
in the first $n$ steps of a behavior. Similarly, the sum
$\sum_{k=0}^{n-1} r(X_k, Y_k, X_{k+1})$ indicates the total outcome for
the first $n$ steps. Hence, we can define the n-stage
average outcome of experiments as follows.

Definition 8 (n-stage average outcome) Given
a behavior $\omega$ of a product between a TPS and an
experiment, we define the n-stage average outcome of $\omega$
by

$$\mathcal{H}_n(\omega) = \frac{\sum_{k=0}^{n-1} r(X_k, Y_k, X_{k+1})}{\sum_{k=0}^{n-1} w(X_k, X_{k+1})}.$$

The argument $\omega$ in $\mathcal{H}_n(\omega)$ is simply a reminder that
$\mathcal{H}_n$ is a random variable, i.e. a measurable function of
the behavior. ■
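Definition 8 can be evaluated on a recorded finite run as follows. This is a minimal sketch: the function name `n_stage_average` and the encoding of a run as a list of per-step pairs (r, w) are assumptions, and returning `None` when no experiment has completed yet is a choice made here, since the definition leaves that case unspecified.

```python
def n_stage_average(steps, n):
    """n-stage average outcome of a run.

    steps[k] is the pair (r_k, w_k) observed on the k-th transition:
    r_k is the outcome earned and w_k is 1 if an experiment ended."""
    total_outcome = sum(r for r, _ in steps[:n])
    completed = sum(w for _, w in steps[:n])
    if completed == 0:
        return None   # assumption: undefined before the first completion
    return total_outcome / completed


# Hypothetical run: three experiments end, two of them with outcome 1.
run = [(0, 0), (1, 1), (0, 0), (0, 1), (1, 1)]
averages = [n_stage_average(run, n) for n in range(1, 6)]
```

On this run the averages for n = 2, 4 and 5 are 1, 1/2 and 2/3, which shows how the sequence can move up and down as further experiments complete.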

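The long-run average outcome of the experiment
on a behavior $\omega$ is thus related to $\lim_{n \to \infty} \mathcal{H}_n(\omega)$, or
better to $\liminf_{n \to \infty} \mathcal{H}_n(\omega)$ and $\limsup_{n \to \infty} \mathcal{H}_n(\omega)$,
since under certain policies the sequence $\{\mathcal{H}_n(\omega)\}_{n \ge 0}$
may oscillate for $n \to \infty$.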

4.2  Extending Temporal Logics with Experiments

To enable the specification of long-run average
properties of systems, we extend probabilistic or
non-probabilistic versions of the temporal logics CTL and
CTL* with experiments. The extensions are obtained
by introducing two new operators P and D, which are
used to express bounds on the average long-run
outcome of experiments from given starting states. For
$\bowtie \in \{<, \le, \ge, >\}$ we introduce the following state
formulas in the logics.

• If $\Psi$ is a P-experiment and $a \in \mathbb{R}$, then $P_{\bowtie a}(\Psi)$
is a state formula.

• If $\Psi$ is a D-experiment and $a \in \mathbb{R}$, then $D_{\bowtie a}(\Psi)$
is a state formula.

Intuitively, $P_{\bowtie a}(\Psi)$ holds at a state $s$ if, under all
policies, a behavior that performs infinitely many
experiments yields a long-run average outcome that is
$\bowtie a$ with probability 1. The semantics of operator
D is analogous. To define the semantics precisely, we
need an auxiliary predicate $I$ that characterizes the
behaviors on which the long-run outcome of an
experiment is well-defined.

Definition 9 (predicate I) For a behavior $\omega$ of a
product between a TPS and an experiment, the truth
value of predicate $I$ is defined by $\omega \models I$ iff

$$\sum_{k=0}^{\infty} \bigl[ w(X_k, X_{k+1}) + |r(X_k, Y_k, X_{k+1})| \bigr] = \infty. \qquad (2)$$

Thus, $I$ holds either if $\omega$ spends an infinite amount
of time at timed vertices, or if $\omega$ performs infinitely
many experiments. It is not difficult to show that the
truth-set of $I$ is measurable. ■

The operators P and D specify bounds on the long-run
average outcome only for behaviors that satisfy
predicate $I$, i.e. behaviors that perform infinitely many
experiments, or that spend an infinite amount of time
at timed vertices. The reason is that the behaviors
that do not satisfy $I$ eventually cease to be involved
in the active part of the experiment, so that the limit
$\lim_{n \to \infty} \mathcal{H}_n(\omega)$ represents a short-term average
outcome, rather than a long-run one. This short-term
average outcome can be influenced by statistical
fluctuations. This will be illustrated later in Example 3.

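The semantics of operators $\bar{P}$ and $\bar{D}$ is defined in terms
of $H_n$ and I as follows.

Definition 10 (semantics of $\bar{P}$ and $\bar{D}$) Given a TPS
$\Pi = (S, A, p, S_{in}, time, I)$ and an experiment
$\Psi = (V, E, E_r, V_{in}, \lambda)$, let $\Pi_\Psi = \Pi \otimes \Psi$
be the product MDP.

First, we define the semantics of $\bar{P}$, $\bar{D}$ on $\Pi_\Psi$.
For a state $(s,v)$ of $\Pi_\Psi$, $a \in \mathbb{R}$ and
$\bowtie \in \{\ge, >\}$, we have that
$(s,v) \models \bar{P}_{\bowtie a}(\Psi)$ (resp.
$(s,v) \models \bar{D}_{\bowtie a}(\Psi)$) if, for all policies $\eta$,

$$\Pr^{\eta}_{(s,v)}\bigl( I \rightarrow \liminf_{n\to\infty} H_n(\omega) \bowtie a \bigr) = 1 . \qquad (3)$$

The definition of $(s,v) \models \bar{P}_{\bowtie a}(\Psi)$ (resp.
$(s,v) \models \bar{D}_{\bowtie a}(\Psi)$) for $\bowtie \in \{<, \le\}$
is analogous.

For all states $s \in S$ of $\Pi$, the semantics of $\bar{P}$ and
$\bar{D}$ is then defined by

$$s \models \mathsf{S}_{\bowtie a}(\Psi) \quad\text{iff}\quad (s, V_{in}(s)) \models \mathsf{S}_{\bowtie a}(\Psi) ,$$

where $\mathsf{S}$ is one of $\bar{P}$, $\bar{D}$ and
$\bowtie \in \{<, \le, \ge, >\}$. ■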

Example 2 (specifying the call throughput of SHARED-RES) Using the
experiment of Figure 2, we can specify the call throughput requirement
Req1 of system SHARED-RES using the formula $\phi_{Req1}$:

P>6cW-  ■ 


Figure 3: Product MDP $\Pi_\Psi = (S', A, p, w, r)$, corresponding to a
gambling system. The action sets are $A(s_0) = \{gamble\}$ and
$A(s_3) = \{idle, gamble\text{-}again\}$.


The  following  example  illustrates  the  necessity  of 
excluding  from  the  considerations  the  behaviors  on 
which  predicate  I  does  not  hold. 


Example 3 Consider a gambling system $\Pi$ which, upon each gamble,
returns a gain of $\pm 1$ with equal probability. After each gamble,
the player can either be idle, or gamble again. Figure 3 depicts the
MDP resulting from the product of the system with an experiment $\Psi$
that measures the average gain per gamble. Our specification for this
system is $\bar{P}_{\ge -0.3}(\Psi)$. This formula specifies that the
long-run average outcome of the experiment (i.e., the long-run average
gain per gamble) should be at least $-0.3$.

Along a behavior that gambles infinitely often, the long-run average
gain per gamble is 0 with probability 1. Thus, by (2) and (3), we have
$s_0 \models \bar{P}_{\ge -0.3}(\Psi)$, which agrees with our intuitive
understanding of long-run average outcome.

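On the other hand, the short-term average outcome might be different
from 0. In particular, let $\eta$ be the policy that prescribes to
gamble exactly 4 times starting from $s_0$, and then be idle forever.
All four gambles are lost with probability $1/16$, in which case the
average gain per gamble is $-1$. Clearly,

$$\Pr^{\eta}_{s_0}\bigl( \liminf_{n\to\infty} H_n(\omega) = -1 \bigr) = 1/16 .$$

Hence, if we dropped the restriction to behaviors satisfying I in (3),
we would have $s_0 \not\models \bar{P}_{\ge -0.3}(\Psi)$.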

This example shows that omitting predicate I from (3) would alter
drastically the semantics of $\bar{P}$ and $\bar{D}$, making them
useless for the specification of long-run average properties. ■
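
The short-term effect described in Example 3 can be checked by exact enumeration. The following is a minimal sketch, not part of the original development: it assumes that idle steps add neither gain nor gamble count, so that after the policy stops gambling the average gain per gamble stays frozen at its value after the fourth gamble. The function name `average_gain_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def average_gain_distribution(num_gambles):
    """Exact distribution of the average gain per gamble after
    `num_gambles` fair +1/-1 gambles followed by idling forever."""
    dist = {}
    weight = Fraction(1, 2 ** num_gambles)  # each outcome sequence is equally likely
    for gains in product((+1, -1), repeat=num_gambles):
        avg = Fraction(sum(gains), num_gambles)
        dist[avg] = dist.get(avg, Fraction(0)) + weight
    return dist


dist = average_gain_distribution(4)
# Probability that all four gambles are lost.
all_lost = dist[Fraction(-1)]
# Probability that the frozen average falls below the bound -0.3.
violating = sum(pr for avg, pr in dist.items() if avg < Fraction(-3, 10))
print(all_lost, violating)
```

Under these assumptions the bound $-0.3$ is violated with positive probability on behaviors that do not satisfy I, which is exactly why (3) restricts attention to behaviors satisfying I.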
4.3  Specification Languages and Property Classification

The specification language we propose in this paper embeds experiments
into branching-time temporal logic. This might seem a poor fit, since
temporal operators and experiments cannot be nested arbitrarily. Two
reasons motivated our proposal. First, if the state space of the system
under analysis is not strongly connected, the long-run average outcome
of experiments can have different values when measured from different
states. Temporal logic enables us to specify the set of states from
which to measure it.

Moreover, operators $\bar{P}$ and $\bar{D}$ can be combined with other
probabilistic extensions of CTL and CTL* to yield powerful languages
for the specification of a wide range of correctness, reliability and
performance properties. The logics GPTL and GPTL* (generalized
probabilistic temporal logic) presented in [12] combine the long-run
average operators $\bar{P}$ and $\bar{D}$ with the operators P and D.
The operator P, mentioned in Section 3, can express bounds on the
probability with which linear-time temporal logic formulas hold;
operator D, presented in [13], can express bounds on the expected
amount of time required to reach given subsets of states. We call the
properties that can be expressed by operators P and D single-event
properties, to emphasize the fact that they involve the occurrence of a
single event (satisfying a linear-time temporal formula, or reaching a
subset of states).

The duality between long-run average properties and single-event
properties has been mentioned in the introduction: long-run average
properties refer to time averages, while single-event properties refer
to ensemble averages. This duality is reflected in the different types
of questions that arise during verification.

The model checking of GPTL* formula $P_{\bowtie a}\phi$ involves the
construction of the product between the TPS and the deterministic Rabin
automaton for $\phi$ or $\neg\phi$ [13]. Deterministic Rabin automata
are strictly more expressive than deterministic Büchi automata [22].
Nonetheless, for the sake of simplicity we consider an algorithm that
computes the product with a deterministic Büchi automaton instead. Such
an algorithm can be used for the subclass of formulas that can be
encoded as deterministic Büchi automata.

In the resulting product structure, to decide whether
$s \models P_{\bowtie a}\phi$, we essentially need to answer the
question:

    What is the probability of visiting the accepting states
    infinitely often?

Consider now the specification $\bar{P}_{\bowtie a}(\Psi)$. In the
structure resulting from the product between the system and the
experiment, the outcomes are associated with the reset edges of the
experiment. The long-run average outcome depends on the relative
frequency with which we traverse these edges. Hence, to decide whether
$s \models \bar{P}_{\bowtie a}(\Psi)$, we essentially need to answer
the question:

    If we traverse the reset edges infinitely often, with what
    relative frequency do we traverse them?

Hence, we see that there is a direct correspondence between
deterministic Büchi automata and experiments: the accepting states of
Büchi automata correspond to the reset edges of experiments.
P-experiments could in fact be defined as deterministic
stutter-invariant edge-Büchi automata with outcomes associated to the
accepting edges. The duality between the verification questions
corresponding to P and $\bar{P}$ illustrates the duality between the
classes of properties expressed by these operators.

Given the duality between single-event and long-run average properties,
it is natural to ask whether there are families of systems whose
specifications are better captured by a particular class of properties.
While there is no absolute answer to this question, long-run average
properties seem to be better suited to the study of systems that have
an interesting steady-state behavior. Examples of such systems are
communication networks and distributed systems in which no irreversible
failures can occur.

Single-event properties are instead suited to the study of systems that
have uninteresting steady-state behavior. Several system models used
for reliability analysis fall in this category. These models can often
reach an irreversible "failure" state, in which case the steady-state
distribution is degenerate. The properties of interest are related to
the probability or expected time to reach the failure state from given
sets of states.

5  Verification  of  Long-Run  Average 
Properties 

In this section, we present a model-checking algorithm to determine the
truth value of formulas of the form $\bar{P}_{\bowtie a}(\Psi)$ and
$\bar{D}_{\bowtie a}(\Psi)$ at all states of the product between a TPS
and an experiment, where $\bowtie \in \{<, \le, \ge, >\}$ and
$a \in \mathbb{R}$. The algorithm relies on new results on the theory
of Markov decision processes, and on a connection with optimization
problems for semi-Markov decision processes [12]. The correctness proof
for the algorithms is fairly complex, and has been presented in
[12, §6]: here, we will only provide a partial and informal
justification.

We note that for purely probabilistic systems, it is possible to use a
simpler algorithm, based on the computation of the steady-state
distribution of Markov chains [12, §6].

To present the algorithms, we need the preliminary notions of sub-MDP
and end component [12].

5.1  End  Components 

End components are the analogue, for Markov decision processes, of the
closed recurrent classes of Markov chains [20]: they represent the sets
of states and actions that can be repeated infinitely often along a
behavior with non-zero probability.

Definition 11 (sub-MDPs and end components) Given an MDP
$\Pi = (S, A, p)$, a sub-MDP is a pair $(C, D)$, where $C \subseteq S$
and $D$ is a function that associates with each $s \in C$ a set
$D(s) \subseteq A(s)$ of actions. A sub-MDP $(C, D)$ is an end
component if the following conditions hold:

•  Closure: for all $s \in C$, $a \in D(s)$, and $t \in S$, if
   $p_{st}(a) > 0$ then $t \in C$.

•  Connectivity: let

   $$E = \{ (s,t) \in C \times C \mid \exists a \in D(s) \,.\, p_{st}(a) > 0 \} ;$$

   then, the graph $(C, E)$ is strongly connected.

We say that an end component $(C, D)$ is contained in a sub-MDP
$(C', D')$ if

$$\{ (s,a) \mid s \in C \wedge a \in D(s) \} \subseteq \{ (s,a) \mid s \in C' \wedge a \in D'(s) \} .$$

We say that an end component $(C, D)$ is maximal in a sub-MDP
$(C', D')$ if there is no other end component $(C'', D'')$ contained in
$(C', D')$ that properly contains $(C, D)$. We denote by
$\mathrm{maxEC}(C', D')$ the set of maximal end components of
$(C', D')$. ■
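
To make Definition 11 concrete, the following sketch computes the maximal end components of a sub-MDP by alternating the closure condition with a strongly connected component decomposition. It is one standard way to do this and is not claimed to be the algorithm of [12, §3]; the dictionary layout `p[s][a][t]` for $p_{st}(a)$ and the names `sccs` and `max_end_components` are assumptions of the sketch.

```python
def sccs(nodes, succ):
    """Strongly connected components via mutual reachability (simple, polynomial)."""
    reach = {}
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach[s] = seen
    comps, assigned = [], set()
    for s in nodes:
        if s not in assigned:
            comp = {t for t in reach[s] if s in reach[t]}
            comps.append(comp)
            assigned |= comp
    return comps


def max_end_components(p, states, actions):
    """Return maxEC(C, D) as a list of (C_i, D_i) pairs, where `states`
    is C, `actions[s]` is D(s), and p[s][a] maps t to p_st(a)."""
    act = {s: set(actions[s]) for s in states}
    pending, mecs = [set(states)], []
    while pending:
        cand = pending.pop()
        stable = False
        while not stable:  # enforce the closure condition inside `cand`
            stable = True
            for s in list(cand):
                keep = {a for a in act[s]
                        if all(t in cand for t, pr in p[s][a].items() if pr > 0)}
                if keep != act[s]:
                    act[s], stable = keep, False
                if not keep:  # a state with no action left cannot be in an EC
                    cand.discard(s)
                    stable = False
        if not cand:
            continue
        succ = {s: {t for a in act[s] for t, pr in p[s][a].items() if pr > 0}
                for s in cand}
        comps = sccs(cand, succ)
        if len(comps) == 1:  # closed and strongly connected
            mecs.append((frozenset(cand), {s: frozenset(act[s]) for s in cand}))
        else:  # split and re-examine each component separately
            pending.extend(comps)
    return mecs


# Hypothetical three-state MDP: action "b" may leave {0, 1} for state 2.
p = {0: {"a": {1: 1.0}, "b": {0: 0.5, 2: 0.5}},
     1: {"c": {0: 1.0}},
     2: {"d": {2: 1.0}}}
found = dict(max_end_components(p, p.keys(), {s: p[s].keys() for s in p}))
print(found)
```

In the example, action `b` is excluded from the component on states 0 and 1 because it can leave that set, which is the closure condition at work.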

It is not difficult to see that, given a sub-MDP $(C, D)$, the set
$\mathrm{maxEC}(C, D)$ can be computed in time polynomial in
$|C| + \sum_{s \in C} |D(s)|$ using simple graph algorithms; an
algorithm to do so is given in [12, §3].

Given a behavior $\omega$, let

$$C_\omega = \{ s \mid \exists^{\infty} k \,.\, X_k = s \}$$

$$D_\omega(s) = \{ a \mid \exists^{\infty} k \,.\, X_k = s \wedge Y_k = a \} ,$$

where $\exists^{\infty} k$ stands for "there are infinitely many
different $k$'s". The sub-MDP $(C_\omega, D_\omega)$ corresponds to the
states and actions that are repeated infinitely often along $\omega$.
The proof of the following result can be found in [12].

Theorem 1 (fundamental theorem of end components) For all $s \in S$ and
all policies $\eta$,

$$\Pr^{\eta}_{s}\bigl( (C_\omega, D_\omega) \text{ is an end component} \bigr) = 1 .$$

5.2  The  Model-Checking  Algorithm 

Consider an MDP $\Pi = \Pi_0 \otimes \Psi = (S, A, p, r, w)$ resulting
from the synchronous product of a TPS $\Pi_0$ with experiment $\Psi$.

We define the threshold outcomes $T_s^+$ and $T_s^-$ of $\Pi$ to be,
respectively, the maximum and minimum values of the long-run average
outcome of $\Psi$ that can be attained with non-zero probability under
some policy starting from state $s$.

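Definition 12 (threshold outcomes) For all $s \in S$, we define the
threshold outcome $T_s^+$ by

$$T_s^+ = \sup \bigl\{ a \in \mathbb{R} \mid \exists \eta \,.\, \Pr^{\eta}_{s}\bigl( I \wedge \limsup_{n\to\infty} H_n(\omega) \ge a \bigr) > 0 \bigr\} .$$

The threshold outcome $T_s^-$ is defined similarly. We use the
conventions $\sup \emptyset = -\infty$, $\inf \emptyset = +\infty$. ■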

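The truth value of $\bar{P}_{\bowtie a}(\Psi)$ and
$\bar{D}_{\bowtie a}(\Psi)$ at all $s \in S$ can be computed by
comparing the threshold outcomes with $a$, as stated by the following
theorem.

Theorem 2 For $\mathsf{S} \in \{\bar{P}, \bar{D}\}$ and
$\bowtie \in \{<, \le\}$, we have

$$s \models \mathsf{S}_{\bowtie a}(\Psi) \quad\text{iff}\quad T_s^+ \bowtie a .$$

A similar result holds for $\bowtie \in \{\ge, >\}$ and $T_s^-$.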
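
Theorem 2 reduces model checking to a comparison, which the following sketch spells out. It assumes that the "similar result" for $\ge$ and $>$ compares $T_s^-$ with $a$ in the same way, and it encodes the conventions $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$ as floating-point infinities. The function name `holds` and the string encoding of the relation are hypothetical.

```python
import operator

# Upper bounds are checked against T+, lower bounds against T-.
_UPPER = {"<": operator.lt, "<=": operator.le}
_LOWER = {">": operator.gt, ">=": operator.ge}


def holds(relation, a, t_plus, t_minus):
    """Truth value of a long-run average formula with bound `a` at a state
    whose threshold outcomes are `t_plus` (T+) and `t_minus` (T-)."""
    if relation in _UPPER:
        return _UPPER[relation](t_plus, a)
    if relation in _LOWER:
        return _LOWER[relation](t_minus, a)
    raise ValueError(f"unknown relation: {relation!r}")


# Gambling example: the long-run average gain is 0 on all I-behaviors.
print(holds(">=", -0.3, 0.0, 0.0))
# No behavior satisfies I: T+ = -inf and T- = +inf, so every bound holds.
print(holds("<", 5.0, float("-inf"), float("inf")))
```

With the infinite conventions, a state from which no behavior satisfying I can arise satisfies every formula vacuously, matching the implication in (3).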

The following algorithm computes the threshold outcomes. It uses
Algorithms 2, 3 and 4, which will be described later.

Algorithm 1 (threshold outcomes)

Input: MDP $\Pi = (S, A, p, r, w)$.

Output: $T_s^+$ and $T_s^-$ for all $s \in S$.

Method:

1.  Compute the labelings $W$ and $R$, defined by

    $$W(s,a) = \sum_{t \in S} p_{st}(a)\, w(s,t)$$

    $$R(s,a) = \sum_{t \in S} p_{st}(a)\, r(s,a,t)$$

    for all $s \in S$ and $a \in A(s)$. Denote by
    $\Pi' = (S, A, p, W, R)$ the MDP obtained by replacing $r$ and $w$
    with $R$ and $W$, respectively. The purpose of this step is to
    simplify the notation.

2.  Let $\{(S_1, A_1), \ldots, (S_n, A_n)\} = \mathrm{maxEC}(S, A)$.
    For each $1 \le i \le n$, construct an MDP
    $\Pi_i = (S_i, A_i, p^i, R_i, W_i)$, where $p^i$, $R_i$, $W_i$ are
    the restrictions of $p$, $R$, $W$ to $(S_i, A_i)$. For $s \in S$,
    let

    $$M_s = \{ i \in [1..n] \mid S_i \text{ reachable in } \Pi' \text{ from } s \} .$$

    By Theorem 1, for a behavior $\omega$ there is with probability 1
    an $i \in [1..n]$ such that
    $(C_\omega, D_\omega) \subseteq (S_i, A_i)$, i.e. $\omega$ is
    eventually confined to $(S_i, A_i)$. If a behavior $\omega$
    satisfies I (the case of interest), the limit
    $\liminf_{n\to\infty} H_n(\omega)$ does not depend on any initial
    prefix of $\omega$ (and similarly for
    $\limsup_{n\to\infty} H_n(\omega)$).

    Let $T^+_{t,i}$, $T^-_{t,i}$ be the threshold outcomes associated
    with state $t \in S_i$, computed with respect to the MDP $\Pi_i$.
    Since for $1 \le i \le n$ each $\Pi_i$ is strongly connected, it is
    possible to prove that $T^+_{t,i}$ and $T^-_{t,i}$ do not depend on
    $t \in S_i$, so that we can write simply $T_i^+$ and $T_i^-$. From
    the above considerations it follows that

    $$T_s^+ = \max_{i \in M_s} T_i^+ \qquad T_s^- = \min_{i \in M_s} T_i^- . \qquad (4)$$

    Hence, to solve the model-checking problem it suffices to compute
    $T_i^+$ and $T_i^-$ for all $1 \le i \le n$.

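3.  Compute the set

    $$\mathcal{L} = \{ i \in [1..n] \mid \exists s \in S_i \,.\, \exists a \in A_i(s) \,.\, [R_i(s,a) > 0 \vee W_i(s,a) > 0] \} .$$

    If $i \notin \mathcal{L}$, the behaviors that are confined to
    $\Pi_i$ do not satisfy I. Hence, for
    $i \in \{1, \ldots, n\} - \mathcal{L}$ let

    $$T_i^+ = -\infty \qquad T_i^- = +\infty . \qquad (5)$$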

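4.  Transform, using Algorithm 2, each
    $\Pi_i = (S_i, A_i, p^i, R_i, W_i)$, $i \in \mathcal{L}$, into an
    MDP
    $\hat{\Pi}_i = (\hat{S}_i, \hat{A}_i, \hat{p}^i, \hat{R}_i, \hat{W}_i)$
    such that the predicate I holds with probability 1, for all
    policies.

    Let $\hat{T}_i^+$, $\hat{T}_i^-$ be the threshold outcomes computed
    on the MDPs $\hat{\Pi}_i$, for $i \in \mathcal{L}$. For
    $i \in \mathcal{L}$, it can be shown that:

    $$T_i^+ = \hat{T}_i^+ \qquad T_i^- = \hat{T}_i^- . \qquad (6)$$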


5.  Compute, using Algorithm 3, the sets

    $$K^- = \{ i \in \mathcal{L} \mid \hat{T}_i^- = +\infty \} \qquad (7)$$

    $$K^+ = \{ i \in \mathcal{L} \mid \hat{T}_i^+ = +\infty \} . \qquad (8)$$

6.  For every policy, predicate I holds with probability 1 on
    $\hat{\Pi}_i$, for $i \in \mathcal{L}$. This enables us to
    disregard predicate I when working on $\hat{\Pi}_i$, leading to a
    connection between the computation of $\hat{T}_i^+$, $\hat{T}_i^-$
    and the solution of an optimization problem for semi-Markov
    decision processes [12].

    Algorithm 4 exploits this connection to compute $\hat{T}_i^+$
    (resp. $\hat{T}_i^-$) for all $i \notin K^+$ (resp.
    $i \notin K^-$). The threshold outcomes $T_s^+$ and $T_s^-$ in
    $\Pi$ at all $s \in S$ can be computed by (4), (5), (6), (7),
    (8). ■
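
Two of the steps of Algorithm 1 are purely mechanical and can be sketched directly: the expected labelings of step 1 and the combination rule (4). The sketch below assumes the dictionary layout `p[s][a][t]` for $p_{st}(a)$, treats labels missing from `w` and `r` as 0, and takes the per-component thresholds $T_i^+$, $T_i^-$ as given inputs. The function names are hypothetical.

```python
def expected_labels(p, w, r):
    """Step 1: W(s,a) = sum_t p_st(a) w(s,t), R(s,a) = sum_t p_st(a) r(s,a,t)."""
    W, R = {}, {}
    for s, acts in p.items():
        for a, dist in acts.items():
            W[s, a] = sum(pr * w.get((s, t), 0) for t, pr in dist.items())
            R[s, a] = sum(pr * r.get((s, a, t), 0) for t, pr in dist.items())
    return W, R


def reachable(p, s):
    """States reachable from s through transitions of positive probability."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for dist in p[u].values():
            for t, pr in dist.items():
                if pr > 0 and t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def combine_thresholds(p, components, t_plus, t_minus):
    """Equation (4): components[i] is the state set S_i, and t_plus[i],
    t_minus[i] are the thresholds of the i-th maximal end component."""
    out = {}
    for s in p:
        reach = reachable(p, s)
        m_s = [i for i, s_i in enumerate(components) if reach & s_i]
        out[s] = (max((t_plus[i] for i in m_s), default=float("-inf")),
                  min((t_minus[i] for i in m_s), default=float("inf")))
    return out


# Hypothetical gamble: +1 or -1 with equal probability, one experiment each.
p = {"s0": {"gamble": {"up": 0.5, "down": 0.5}},
     "up": {"go": {"s0": 1.0}}, "down": {"go": {"s0": 1.0}}}
w = {("s0", "up"): 1, ("s0", "down"): 1}
r = {("s0", "gamble", "up"): 1, ("s0", "gamble", "down"): -1}
W, R = expected_labels(p, w, r)

# Hypothetical MDP with two absorbing end components and given thresholds.
q = {0: {"a": {1: 0.5, 2: 0.5}}, 1: {"b": {1: 1.0}}, 2: {"c": {2: 1.0}}}
thresholds = combine_thresholds(q, [{1}, {2}], [3.0, 5.0], [1.0, 2.0])
print(W["s0", "gamble"], R["s0", "gamble"], thresholds)
```

State 0 of the second example can reach both components, so it inherits the larger $T^+$ and the smaller $T^-$, as prescribed by (4).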

5.3  Eliminating  non-I  Behaviors 

Algorithm 2 transforms an MDP into a related MDP on which predicate I
holds with probability 1. The idea of the algorithm is the following.
From Theorem 1, predicate I can be false with positive probability iff
the MDP contains a (reachable) end component all of whose state-action
pairs have $R = W = 0$. By eliminating these end components, we can
ensure that I holds with probability 1. The offending end components
are eliminated by collapsing each of them to a single state, and by
removing all the actions belonging to the end component. Since the
state-action pairs that are collapsed have $R = W = 0$, it is possible
to prove that the transformation leaves the threshold outcomes
unchanged, as stated in (6).

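Algorithm 2 (I-transformation)

Input: MDP $\Pi = (S, A, p, R, W)$.

Output: MDP $\hat{\Pi} = (\hat{S}, \hat{A}, \hat{p}, \hat{R}, \hat{W})$.

Method: For each $s \in S$, let

$$D(s) = \{ a \in A(s) \mid R(s,a) = 0 \wedge W(s,a) = 0 \}$$

be the set of actions associated with $s$ that have $R$- and
$W$-labels equal to 0. Also, let

$$\{(B_1, D_1), \ldots, (B_n, D_n)\} = \mathrm{maxEC}(S, D) .$$

The MDP $\hat{\Pi}$ is obtained from $\Pi$ by collapsing each end
component $(B_i, D_i)$ into a single state $\hat{s}_i$, for
$1 \le i \le n$. The new set of states is given by
$\hat{S} = S \cup \{\hat{s}_1, \ldots, \hat{s}_n\} - \bigcup_{i=1}^{n} B_i$.
The action sets are then defined as follows.

•  For $s \in S - \bigcup_{i=1}^{n} B_i$,
   $\hat{A}(s) = \{ \langle s, a \rangle \mid a \in A(s) \}$.

•  For $1 \le i \le n$:

   $$\hat{A}(\hat{s}_i) = \{ \langle s, a \rangle \mid s \in B_i \wedge a \in A(s) - D_i(s) \} ,$$

   so that exactly the actions belonging to the collapsed end component
   are removed.

For $s \in \hat{S}$, $t \in S - \bigcup_{i=1}^{n} B_i$,
$1 \le i \le n$ and $\langle u, a \rangle \in \hat{A}(s)$, the
transition probabilities and the labelings $\hat{R}$, $\hat{W}$ are
defined by

$$\hat{p}_{st}(\langle u, a \rangle) = p_{ut}(a) \qquad \hat{p}_{s \hat{s}_i}(\langle u, a \rangle) = \sum_{t \in B_i} p_{ut}(a)$$

$$\hat{R}(s, \langle u, a \rangle) = R(u, a) \qquad \hat{W}(s, \langle u, a \rangle) = W(u, a) . \;\; ■$$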
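
The collapsing step of Algorithm 2 can be sketched as follows. The sketch assumes that the zero-labelled maximal end components $(B_i, D_i)$ have already been computed and are passed in as `components`; the collapsed state $\hat{s}_i$ is represented by the tuple `("ec", i)` and the pair $\langle u, a \rangle$ by the tuple `(u, a)`. These encodings, the function name `collapse`, and the two-state gambling MDP used as input are all hypothetical simplifications.

```python
def collapse(p, R, W, components):
    """Collapse each end component (B_i, D_i) into one state ("ec", i),
    dropping the actions in D_i and redirecting transitions into B_i."""
    owner = {s: i for i, (B, _) in enumerate(components) for s in B}

    def rename(t):
        return ("ec", owner[t]) if t in owner else t

    new_p, new_R, new_W = {}, {}, {}
    for u, acts in p.items():
        s_hat = rename(u)
        removed = components[owner[u]][1].get(u, ()) if u in owner else ()
        new_p.setdefault(s_hat, {})
        for a, dist in acts.items():
            if a in removed:
                continue  # the action belongs to the collapsed end component
            merged = {}
            for t, pr in dist.items():  # probabilities into B_i are summed
                merged[rename(t)] = merged.get(rename(t), 0) + pr
            new_p[s_hat][(u, a)] = merged
            new_R[s_hat, (u, a)] = R[u, a]
            new_W[s_hat, (u, a)] = W[u, a]
    return new_p, new_R, new_W


# Simplified gambling MDP: idling at s3 forever is a zero-labelled end component.
p = {"s0": {"gamble": {"s3": 1.0}},
     "s3": {"idle": {"s3": 1.0}, "again": {"s0": 1.0}}}
R = {("s0", "gamble"): 0.0, ("s3", "idle"): 0.0, ("s3", "again"): 0.0}
W = {("s0", "gamble"): 1.0, ("s3", "idle"): 0.0, ("s3", "again"): 0.0}
components = [({"s3"}, {"s3": {"idle"}})]
new_p, new_R, new_W = collapse(p, R, W, components)
print(new_p)
```

After the transformation the only action left at the collapsed state is `again`, so every behavior gambles infinitely often and predicate I holds with probability 1.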

5.4  Computation of Convergent MDPs

Algorithm 3 (convergent MDPs)

Input: Set $\mathcal{L}$.

Output: Sets $K^-$ and $K^+$, defined as in (7), (8).

Method: For each $i \in \mathcal{L}$:

•  For $s \in \hat{S}_i$, let
   $B(s) = \{ a \in \hat{A}_i(s) \mid \hat{W}_i(s,a) = 0 \}$ be the set
   of actions having $\hat{W}_i = 0$. Let

   $$\{(C_1, D_1), \ldots, (C_m, D_m)\} = \mathrm{maxEC}(\hat{S}_i, B) .$$

   Then, $i \in K^+$ iff there are $j \in [1..m]$, $s \in C_j$ and
   $a \in D_j(s)$ such that $\hat{R}_i(s,a) > 0$.

•  $i \in K^-$ iff both of the following conditions hold:

   -  for all $s \in \hat{S}_i$ and $a \in \hat{A}_i(s)$,
      $\hat{W}_i(s,a) = 0$;

   -  there are $s \in \hat{S}_i$ and $a \in \hat{A}_i(s)$ such that
      $\hat{R}_i(s,a) > 0$. ■
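
The two membership tests of Algorithm 3 are simple once the end components of the zero-$W$ sub-MDP are available. The sketch below takes those components as an input rather than recomputing them, and represents labelings as dictionaries keyed by (state, action) pairs; both choices, and the function names, are assumptions of the sketch.

```python
def in_k_minus(actions, R, W):
    """i is in K- iff every action has W = 0 and some action has R > 0.
    `actions[s]` lists the actions available at state s of the i-th MDP."""
    pairs = [(s, a) for s in actions for a in actions[s]]
    return (all(W[s, a] == 0 for s, a in pairs)
            and any(R[s, a] > 0 for s, a in pairs))


def in_k_plus(zero_w_components, R):
    """i is in K+ iff some maximal end component (C_j, D_j) of the
    sub-MDP of actions with W = 0 contains a pair with R > 0."""
    return any(R[s, a] > 0
               for C, D in zero_w_components for s in C for a in D[s])


# Hypothetical one-state MDP: self-loop "a" has W = 0 and R = 2, "b" has W = 1.
actions = {"x": ["a", "b"]}
W = {("x", "a"): 0, ("x", "b"): 1}
R = {("x", "a"): 2, ("x", "b"): 1}
zero_w_components = [({"x"}, {"x": {"a"}})]  # maxEC restricted to W = 0
k_minus = in_k_minus(actions, R, W)
k_plus = in_k_plus(zero_w_components, R)
print(k_minus, k_plus)
```

In the example, repeating `a` forever accumulates outcome without completing any experiment, so the maximal average is unbounded, while `b` keeps the minimal average finite.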

5.5  Computation  of  Threshold  Outcomes 

The following algorithm computes threshold outcomes on strongly
connected MDPs on which predicate I holds with probability 1.

Algorithm 4 (computation of $\hat{T}_i^+$, $\hat{T}_i^-$) If
$i \notin K^-$, consider the following linear programming problem, with
variables $\lambda$, $\{h_s\}_{s \in \hat{S}_i}$: Maximize $\lambda$
subject to

$$h_s \le \hat{R}_i(s,a) - \lambda\, \hat{W}_i(s,a) + \sum_{t \in \hat{S}_i} \hat{p}^i_{st}(a)\, h_t \qquad (9)$$

for all $s \in \hat{S}_i$ and $a \in \hat{A}_i(s)$. Then, all optimal
solutions to the problem share the same value for $\lambda$, and this
value is equal to $\hat{T}_i^-$ [12].

For $i \notin K^+$, the outcome $\hat{T}_i^+$ can be computed by
solving a similar linear programming problem, in which the direction of
the inequality in (9) is reversed, and $\lambda$ is minimized. ■
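
As a small worked instance of the linear program (9), consider a hypothetical MDP (not taken from the paper) with a single state $s$ and two self-loop actions: $a$ with labels $R = 1$, $W = 1$, and $b$ with labels $R = 3$, $W = 2$. Since $W > 0$ for both actions, predicate I holds with probability 1 and the MDP belongs to neither $K^+$ nor $K^-$.

```latex
% Constraints (9), one per action; the h_s terms cancel on a self-loop.
\begin{align*}
  a:&\quad h_s \le 1 - \lambda + h_s   &&\Longleftrightarrow\quad \lambda \le 1 \\
  b:&\quad h_s \le 3 - 2\lambda + h_s  &&\Longleftrightarrow\quad \lambda \le \tfrac{3}{2}
\end{align*}
% Maximizing lambda gives the minimal long-run average outcome:
\[
  \max \lambda = 1 = T^-  \quad (\text{attained by always choosing } a).
\]
% Reversing the inequalities and minimizing lambda:
\[
  \lambda \ge 1,\ \lambda \ge \tfrac{3}{2}
  \quad\Longrightarrow\quad
  \min \lambda = \tfrac{3}{2} = T^+ \quad (\text{attained by always choosing } b).
\]
```

Both values agree with the direct computation of the ratio $R/W$ for the two stationary policies, which is what the connection with semi-Markov decision processes suggests.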


5.6  Correctness  and  Complexity 

The following theorem provides results on the correctness and the
complexity of the model-checking procedure.

Theorem 3 Algorithm 1 correctly computes the threshold outcomes, and it
has time-complexity polynomial in
$l \cdot |S| \cdot \sum_{s \in S} |A(s)|$, where $l$ is the length of
the fixed-precision binary numbers used to encode the transition
probabilities.

If the labels of the experiment vertices are written in appropriately
restricted sub-languages of first-order logic, the time-complexity of
the verification process is polynomial both in the size of the TPS and
in the size of the experiment.